Automatic Rule Refinement for Information Extraction

Authors

  • Bin Liu
  • Laura Chiticariu
  • Vivian Chu
  • H. V. Jagadish
  • Frederick Reiss
Abstract

Rule-based information extraction from text is increasingly being used to populate databases and to support structured queries on unstructured text. Specifying suitable information extraction rules requires considerable skill, and standard practice is to refine rules iteratively, with substantial effort. In this paper, we show that techniques developed in the context of data provenance, to determine the lineage of a tuple in a database, can be leveraged to assist in rule refinement. Specifically, given a set of extraction rules and correct and incorrect extracted data, we have developed a technique to suggest a ranked list of rule modifications that an expert rule specifier can consider. We implemented our technique in the SystemT information extraction system developed at IBM Research – Almaden and experimentally demonstrate its effectiveness.
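
To make the idea in the abstract concrete, the following is a minimal toy sketch, not the paper's algorithm and not SystemT or AQL code. It assumes each extracted span records which rule produced it (a stand-in for provenance) and carries a user-supplied correct/incorrect label, and it ranks hypothetical candidate filters by how many incorrect results they would remove minus how many correct results they would lose. All names here (Extraction, rank_refinements, the rule names, the city word list) are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extraction:
    text: str      # the extracted span
    rule: str      # provenance stand-in: name of the rule that produced the span
    correct: bool  # user-supplied label for this result


def rank_refinements(extractions, candidates):
    """Rank candidate filters by (false positives removed - true positives lost).

    Each candidate is (rule_name, description, keep), where keep(text) returns
    True if the span would survive the proposed filter on that rule.
    """
    ranked = []
    for rule, description, keep in candidates:
        # Only results traced back to this rule can be affected by the change.
        dropped = [e for e in extractions if e.rule == rule and not keep(e.text)]
        removed_fp = sum(1 for e in dropped if not e.correct)
        lost_tp = sum(1 for e in dropped if e.correct)
        ranked.append((removed_fp - lost_tp, rule, description))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


# Hypothetical labeled output of two hypothetical person-name rules.
extractions = [
    Extraction("John Smith", "CapsPair", True),
    Extraction("New York", "CapsPair", False),
    Extraction("Mary Jones", "CapsPair", True),
    Extraction("San Jose", "CapsPair", False),
    Extraction("Dr. Lee", "TitleName", True),
]

CITY_WORDS = {"New", "York", "San", "Jose"}  # assumed dictionary, illustration only

candidates = [
    ("CapsPair", "drop spans containing a city-dictionary word",
     lambda s: not any(w in CITY_WORDS for w in s.split())),
    ("CapsPair", "drop spans whose first word has fewer than 5 letters",
     lambda s: len(s.split()[0]) >= 5),
    ("TitleName", "drop spans starting with 'Dr.'",
     lambda s: not s.startswith("Dr.")),
]

ranking = rank_refinements(extractions, candidates)
for score, rule, description in ranking:
    print(score, rule, description)
```

In this toy setting the city-dictionary filter ranks first because it removes both incorrect spans without losing a correct one. A real system would need a richer notion of provenance and of candidate modifications than this per-rule filter.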

Similar resources

Development of an Automatic Land Use Extraction System in Urban Areas using VHR Aerial Imagery and GIS Vector Data

Lack of detailed land use (LU) information and efficient data collection methods have made the modeling of urban systems difficult. This study aims to develop a novel hierarchical rule-based LU extraction framework using geographic vector and remotely sensed (RS) data, in order to extract detailed subzonal LU information, residential LU in this study. The LU extraction system is developed to ex...

Integrated-approach-based Automatic Building Extraction

This paper presents an approach to automatic building extraction from large-scale aerial images. It is based on the integration of multiple information, knowledge and appropriate methods, and hierarchically consists of three major parts: building detection, building segment extraction, 3D feature matching and building modelling. Buildings are detected by segmenting DSMs using grey-scale mathema...

Automatic Lane Extraction in Hemoglobin and Serum Protein Electrophoresis Using Image Processing

Image analysis is an image processing technique that aims to extract features or information from images. Image analysis in medicine has a special place because is a basis for disease diagnosis for physicians. Electrophoresis is a laboratory separating technique. Electrophoresis images are created during the electrophoresis process. Serum protein and hemoglobin electrophoresis test are the ...

Validation of Mixed-structured Data Using Pattern Mining and Information Extraction

For large-scale data mining utilizing data from ubiquitous and mixed-structured data sources, the appropriate extraction and integration into a comprehensive data-warehouse is of prime importance. Then, appropriate methods for validation and potential refinement are essential. This paper presents an approach applying data mining and information extraction methods for data validation: We apply s...

Journal:
  • PVLDB

Volume 3  Issue 

Pages  -

Publication date 2010